Overview

Dataset statistics

                                Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   424         430
Missing cells (%)               7.9%        8.0%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
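
The derived rows of this table can be checked against the raw counts. The sketch below assumes (the report does not state its formulas) that the missing-cell percentage is taken over all cells, variables times observations, and that the total size is the average record size times the number of observations; the function names are illustrative only.

```python
def missing_cells_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Percentage of missing cells over all cells in the table (assumed definition)."""
    return 100.0 * missing_cells / (n_variables * n_observations)


def total_size_kib(avg_record_bytes: float, n_observations: int) -> float:
    """Total in-memory size in KiB implied by the average record size."""
    return avg_record_bytes * n_observations / 1024.0


# Counts taken from the Dataset statistics table.
pct_a = round(missing_cells_pct(424, 12, 446), 1)
pct_b = round(missing_cells_pct(430, 12, 446), 1)
size_kib = round(total_size_kib(104.0, 446), 1)
print(pct_a, pct_b, size_kib)
```

Under these assumptions the recomputed values agree with the rounded figures in the table.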

Variable types

              Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Alert type        Dataset A                                        Dataset B
High correlation  Fare is highly overall correlated with Pclass    Alert not present in this dataset
High correlation  Pclass is highly overall correlated with Fare    Alert not present in this dataset
High correlation  Sex is highly overall correlated with Survived   Sex is highly overall correlated with Survived
High correlation  Survived is highly overall correlated with Sex   Survived is highly overall correlated with Sex
Missing           Age has 86 (19.3%) missing values                Age has 85 (19.1%) missing values
Missing           Cabin has 337 (75.6%) missing values             Cabin has 344 (77.1%) missing values
Unique            PassengerId has unique values                    PassengerId has unique values
Unique            Name has unique values                           Name has unique values
Zeros             SibSp has 308 (69.1%) zeros                      SibSp has 308 (69.1%) zeros
Zeros             Parch has 342 (76.7%) zeros                      Parch has 344 (77.1%) zeros
Zeros             Fare has 7 (1.6%) zeros                          Fare has 12 (2.7%) zeros
High correlation  Alert not present in this dataset                Parch is highly overall correlated with SibSp
High correlation  Alert not present in this dataset                SibSp is highly overall correlated with Parch

Reproduction

                         Dataset A                    Dataset B
Analysis started         2024-10-29 15:28:38.892254   2024-10-29 15:28:42.099594
Analysis finished        2024-10-29 15:28:42.096230   2024-10-29 15:28:45.308858
Duration                 3.2 seconds                  3.21 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
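
The Duration row is the difference between the two timestamps above it. A minimal sketch using only the standard library; the format string and the function name are assumptions made for this illustration:

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two timestamps in the report's format."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2024-10-29 15:28:38.892254", "2024-10-29 15:28:42.096230")
dur_b = duration_seconds("2024-10-29 15:28:42.099594", "2024-10-29 15:28:45.308858")
print(round(dur_a, 2), round(dur_b, 2))
```

Rounded to two decimals, both differences agree with the durations shown in the table.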

Variables

PassengerId
Real number (ℝ)

                Dataset A   Dataset B
Distinct        446         446
Distinct (%)    100.0%      100.0%
Missing         0           0
Missing (%)     0.0%        0.0%
Infinite        0           0
Infinite (%)    0.0%        0.0%
Mean            437.9574    441.18834
Minimum         2           1
Maximum         890         891
Zeros           0           0
Zeros (%)       0.0%        0.0%
Negative        0           0
Negative (%)    0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     2           1
5-th percentile             41.75       51.25
Q1                          221.25      216.25
median                      437.5       436.5
Q3                          651.75      676.75
95-th percentile            849.5       837.25
Maximum                     890         891
Range                       888         890
Interquartile range (IQR)   430.5       460.5
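
The last two rows follow from the others: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A small sketch that recomputes both from the PassengerId quantiles above (the helper name is illustrative):

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Range and interquartile range derived from four quantile values."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# PassengerId quantiles from the table above.
spread_a = spread(minimum=2, q1=221.25, q3=651.75, maximum=890)
spread_b = spread(minimum=1, q1=216.25, q3=676.75, maximum=891)
print(spread_a, spread_b)
```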

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                255.59131       256.13092
Coefficient of variation (CV)     0.58359856      0.5805478
Kurtosis                          -1.1571442      -1.2124609
Mean                              437.9574        441.18834
Median Absolute Deviation (MAD)   216             228.5
Skewness                          0.059607537     0.055496879
Sum                               195329          196770
Variance                          65326.917       65603.048
Monotonicity                      Not monotonic   Not monotonic
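
Several of these rows are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the mean is the sum divided by the number of observations (446 here, since PassengerId has no missing values). The sketch below checks the reported PassengerId figures against those identities; small differences come from rounding in the printed values.

```python
def coefficient_of_variation(std: float, mean: float) -> float:
    """Standard deviation expressed as a fraction of the mean."""
    return std / mean


# Reported values for PassengerId (Dataset A, Dataset B).
cv_a = coefficient_of_variation(255.59131, 437.9574)
cv_b = coefficient_of_variation(256.13092, 441.18834)
var_a = 255.59131 ** 2
var_b = 256.13092 ** 2
mean_a = 195329 / 446
mean_b = 196770 / 446
print(cv_a, cv_b, var_a, var_b, mean_a, mean_b)
```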
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
789                 1       0.2%             814                 1       0.2%
150                 1       0.2%             264                 1       0.2%
307                 1       0.2%             687                 1       0.2%
503                 1       0.2%             499                 1       0.2%
282                 1       0.2%             59                  1       0.2%
485                 1       0.2%             812                 1       0.2%
230                 1       0.2%             746                 1       0.2%
663                 1       0.2%             492                 1       0.2%
436                 1       0.2%             131                 1       0.2%
728                 1       0.2%             838                 1       0.2%
Other values (436)  436     97.8%            Other values (436)  436     97.8%

Smallest values

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
2                   1       0.2%             1                   1       0.2%
6                   1       0.2%             7                   1       0.2%
8                   1       0.2%             11                  1       0.2%
9                   1       0.2%             12                  1       0.2%
17                  1       0.2%             13                  1       0.2%
18                  1       0.2%             15                  1       0.2%
19                  1       0.2%             16                  1       0.2%
23                  1       0.2%             21                  1       0.2%
24                  1       0.2%             24                  1       0.2%
25                  1       0.2%             25                  1       0.2%

Survived
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
0       268         296
1       178         150

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   0           0
2nd row   1           0
3rd row   0           0
4th row   0           1
5th row   1           0

Common Values

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
0          268     60.1%             296     66.4%
1          178     39.9%             150     33.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: Dataset A, Dataset B]

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
0          268     60.1%             296     66.4%
1          178     39.9%             150     33.6%

Most occurring characters

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
0          268     60.1%             296     66.4%
1          178     39.9%             150     33.6%

Most occurring categories

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  446     100.0%            446     100.0%

Most frequent character per category

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
0          268     60.1%             296     66.4%
1          178     39.9%             150     33.6%

Most occurring scripts

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  446     100.0%            446     100.0%

Most frequent character per script

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
0          268     60.1%             296     66.4%
1          178     39.9%             150     33.6%

Most occurring blocks

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  446     100.0%            446     100.0%

Most frequent character per block

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
0          268     60.1%             296     66.4%
1          178     39.9%             150     33.6%

Pclass
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
3       236         252
1       115         104
2       95          90

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   2           1
2nd row   1           3
3rd row   3           1
4th row   3           2
5th row   1           3

Common Values

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
3          236     52.9%             252     56.5%
1          115     25.8%             104     23.3%
2          95      21.3%             90      20.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: Dataset A, Dataset B]

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
3          236     52.9%             252     56.5%
1          115     25.8%             104     23.3%
2          95      21.3%             90      20.2%

Most occurring characters

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
3          236     52.9%             252     56.5%
1          115     25.8%             104     23.3%
2          95      21.3%             90      20.2%

Most occurring categories

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  446     100.0%            446     100.0%

Most frequent character per category

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
3          236     52.9%             252     56.5%
1          115     25.8%             104     23.3%
2          95      21.3%             90      20.2%

Most occurring scripts

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  446     100.0%            446     100.0%

Most frequent character per script

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
3          236     52.9%             252     56.5%
1          115     25.8%             104     23.3%
2          95      21.3%             90      20.2%

Most occurring blocks

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  446     100.0%            446     100.0%

Most frequent character per block

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
3          236     52.9%             252     56.5%
1          115     25.8%             104     23.3%
2          95      21.3%             90      20.2%

Name
Text

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      67          61
Median length   50          46
Mean length     26.829596   26.078475
Min length      12          12

Characters and Unicode

                      Dataset A   Dataset B
Total characters      11966       11631
Distinct characters   59          60
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

          Dataset A                          Dataset B
1st row   Byles, Rev. Thomas Roussel Davids  Harrison, Mr. William
2nd row   Fleming, Miss. Margaret            Panula, Mr. Jaako Arnold
3rd row   O'Sullivan, Miss. Bridget Mary     Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
4th row   Olsson, Mr. Nils Johan Goransson   West, Miss. Constance Mirium
5th row   Bishop, Mr. Dickinson H            Lester, Mr. James

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
mr                  263     14.6%            mr                  274     15.6%
miss                94      5.2%             miss                90      5.1%
mrs                 61      3.4%             mrs                 50      2.8%
william             27      1.5%             william             31      1.8%
john                16      0.9%             master              22      1.2%
master              16      0.9%             john                20      1.1%
james               15      0.8%             henry               18      1.0%
charles             14      0.8%             james               15      0.9%
henry               14      0.8%             george              11      0.6%
george              13      0.7%             mary                11      0.6%
Other values (889)  1263    70.3%            Other values (882)  1220    69.2%

Most occurring characters

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
(space)             1351    11.3%            (space)             1318    11.3%
r                   984     8.2%             r                   970     8.3%
e                   868     7.3%             e                   838     7.2%
a                   812     6.8%             a                   810     7.0%
i                   683     5.7%             n                   664     5.7%
s                   654     5.5%             s                   638     5.5%
n                   650     5.4%             i                   628     5.4%
M                   570     4.8%             M                   555     4.8%
l                   521     4.4%             l                   492     4.2%
o                   482     4.0%             o                   473     4.1%
Other values (49)   4391    36.7%            Other values (50)   4245    36.5%

Most occurring categories

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  11966   100.0%            11631   100.0%

Most frequent character per category

(unknown)

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
(space)             1351    11.3%            (space)             1318    11.3%
r                   984     8.2%             r                   970     8.3%
e                   868     7.3%             e                   838     7.2%
a                   812     6.8%             a                   810     7.0%
i                   683     5.7%             n                   664     5.7%
s                   654     5.5%             s                   638     5.5%
n                   650     5.4%             i                   628     5.4%
M                   570     4.8%             M                   555     4.8%
l                   521     4.4%             l                   492     4.2%
o                   482     4.0%             o                   473     4.1%
Other values (49)   4391    36.7%            Other values (50)   4245    36.5%

Most occurring scripts

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  11966   100.0%            11631   100.0%

Most frequent character per script

(unknown)

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
(space)             1351    11.3%            (space)             1318    11.3%
r                   984     8.2%             r                   970     8.3%
e                   868     7.3%             e                   838     7.2%
a                   812     6.8%             a                   810     7.0%
i                   683     5.7%             n                   664     5.7%
s                   654     5.5%             s                   638     5.5%
n                   650     5.4%             i                   628     5.4%
M                   570     4.8%             M                   555     4.8%
l                   521     4.4%             l                   492     4.2%
o                   482     4.0%             o                   473     4.1%
Other values (49)   4391    36.7%            Other values (50)   4245    36.5%

Most occurring blocks

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  11966   100.0%            11631   100.0%

Most frequent character per block

(unknown)

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
(space)             1351    11.3%            (space)             1318    11.3%
r                   984     8.2%             r                   970     8.3%
e                   868     7.3%             e                   838     7.2%
a                   812     6.8%             a                   810     7.0%
i                   683     5.7%             n                   664     5.7%
s                   654     5.5%             s                   638     5.5%
n                   650     5.4%             i                   628     5.4%
M                   570     4.8%             M                   555     4.8%
l                   521     4.4%             l                   492     4.2%
o                   482     4.0%             o                   473     4.1%
Other values (49)   4391    36.7%            Other values (50)   4245    36.5%

Sex
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value    Dataset A   Dataset B
male     290         303
female   156         143

Length

                Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.6995516   4.6412556
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2096        2070
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   male        male
2nd row   female      male
3rd row   female      female
4th row   male        female
5th row   male        male

Common Values

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
male       290     65.0%             303     67.9%
female     156     35.0%             143     32.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: Dataset A, Dataset B]

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
male       290     65.0%             303     67.9%
female     156     35.0%             143     32.1%

Most occurring characters

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
e          602     28.7%             589     28.5%
m          446     21.3%             446     21.5%
a          446     21.3%             446     21.5%
l          446     21.3%             446     21.5%
f          156     7.4%              143     6.9%

Most occurring categories

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  2096    100.0%            2070    100.0%

Most frequent character per category

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
e          602     28.7%             589     28.5%
m          446     21.3%             446     21.5%
a          446     21.3%             446     21.5%
l          446     21.3%             446     21.5%
f          156     7.4%              143     6.9%

Most occurring scripts

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  2096    100.0%            2070    100.0%

Most frequent character per script

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
e          602     28.7%             589     28.5%
m          446     21.3%             446     21.5%
a          446     21.3%             446     21.5%
l          446     21.3%             446     21.5%
f          156     7.4%              143     6.9%

Most occurring blocks

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  2096    100.0%            2070    100.0%

Most frequent character per block

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
e          602     28.7%             589     28.5%
m          446     21.3%             446     21.5%
a          446     21.3%             446     21.5%
l          446     21.3%             446     21.5%
f          156     7.4%              143     6.9%
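
The character counts reported for Sex can be derived from the two category counts alone, because every row is either "male" or "female". The sketch below weights each character by the number of rows containing its category; the function name is illustrative and this is not presented as the report's own implementation.

```python
from collections import Counter


def char_counts(value_counts: dict) -> Counter:
    """Character frequencies implied by a {category: row count} mapping."""
    totals = Counter()
    for value, n_rows in value_counts.items():
        for ch in value:
            totals[ch] += n_rows
    return totals


# Category counts for Sex taken from the Common Values table.
chars_a = char_counts({"male": 290, "female": 156})
chars_b = char_counts({"male": 303, "female": 143})

total_a = sum(chars_a.values())
total_b = sum(chars_b.values())
mean_len_a = total_a / 446
mean_len_b = total_b / 446
print(chars_a, total_a, mean_len_a)
print(chars_b, total_b, mean_len_b)
```

The letter "e" is counted twice for each "female" row, which is why it leads the table; the totals and mean lengths agree with the Length and Characters and Unicode tables for Sex.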

Age
Real number (ℝ)

                Dataset A   Dataset B
Distinct        79          74
Distinct (%)    21.9%       20.5%
Missing         86          85
Missing (%)     19.3%       19.1%
Infinite        0           0
Infinite (%)    0.0%        0.0%
Mean            30.615056   29.611274
Minimum         0.67        0.67
Maximum         80          71
Zeros           0           0
Zeros (%)       0.0%        0.0%
Negative        0           0
Negative (%)    0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0.67        0.67
5-th percentile             4.95        4
Q1                          21          21
median                      29          28
Q3                          40          38
95-th percentile            58          55.5
Maximum                     80          71
Range                       79.33       70.33
Interquartile range (IQR)   19          17

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                15.08413        14.256499
Coefficient of variation (CV)     0.49270302      0.48145511
Kurtosis                          0.088619624     -0.02156589
Mean                              30.615056       29.611274
Median Absolute Deviation (MAD)   9.5             8
Skewness                          0.40014997      0.27754377
Sum                               11021.42        10689.67
Variance                          227.53099       203.24778
Monotonicity                      Not monotonic   Not monotonic
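
Age is the one numeric variable here with missing values, so its mean and distinct percentage only reproduce if the denominator is the number of non-missing rows rather than all 446. That denominator is an inference from the reported figures, not something the report states; the sketch below makes the check explicit with an illustrative helper.

```python
def non_missing_stats(n_rows: int, n_missing: int, total: float, n_distinct: int) -> dict:
    """Mean and distinct percentage computed over non-missing rows only (assumed)."""
    n_valid = n_rows - n_missing
    return {
        "n_valid": n_valid,
        "mean": total / n_valid,
        "distinct_pct": 100.0 * n_distinct / n_valid,
    }


# Age: rows, missing count, sum and distinct count from the tables above.
age_a = non_missing_stats(446, 86, 11021.42, 79)
age_b = non_missing_stats(446, 85, 10689.67, 74)
print(age_a, age_b)
```

Dividing by 446 instead would give means near 24.7 and 24.0, which do not match the report.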
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
24                  17      3.8%             22                  16      3.6%
22                  16      3.6%             28                  15      3.4%
19                  15      3.4%             19                  14      3.1%
18                  14      3.1%             24                  14      3.1%
21                  13      2.9%             21                  13      2.9%
36                  13      2.9%             30                  12      2.7%
29                  13      2.9%             35                  12      2.7%
32                  11      2.5%             25                  12      2.7%
31                  9       2.0%             18                  11      2.5%
16                  9       2.0%             36                  11      2.5%
Other values (69)   230     51.6%            Other values (64)   231     51.8%
(Missing)           86      19.3%            (Missing)           85      19.1%

Smallest values

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
0.67                1       0.2%             0.67                1       0.2%
0.75                1       0.2%             1                   5       1.1%
1                   6       1.3%             2                   6       1.3%
2                   4       0.9%             3                   3       0.7%
3                   2       0.4%             4                   6       1.3%
4                   4       0.9%             5                   3       0.7%
5                   3       0.7%             6                   2       0.4%
6                   2       0.4%             7                   3       0.7%
7                   2       0.4%             8                   2       0.4%
8                   1       0.2%             9                   2       0.4%

SibSp
Real number (ℝ)

                Dataset A    Dataset B
Distinct        7            7
Distinct (%)    1.6%         1.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            0.50672646   0.53811659
Minimum         0            0
Maximum         8            8
Zeros           308          308
Zeros (%)       69.1%        69.1%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            3           3
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                1.067897        1.1658551
Coefficient of variation (CV)     2.1074428       2.1665473
Kurtosis                          17.962352       16.281306
Mean                              0.50672646      0.53811659
Median Absolute Deviation (MAD)   0               0
Skewness                          3.6524325       3.604036
Sum                               226             240
Variance                          1.1404041       1.359218
Monotonicity                      Not monotonic   Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]

Common Values

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
0                   308     69.1%            0                   308     69.1%
1                   101     22.6%            1                   100     22.4%
2                   13      2.9%             4                   12      2.7%
3                   11      2.5%             2                   12      2.7%
4                   8       1.8%             3                   7       1.6%
8                   3       0.7%             8                   4       0.9%
5                   2       0.4%             5                   3       0.7%

Smallest values

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
0          308     69.1%             308     69.1%
1          101     22.6%             100     22.4%
2          13      2.9%              12      2.7%
3          11      2.5%              7       1.6%
4          8       1.8%              12      2.7%
5          2       0.4%              3       0.7%
8          3       0.7%              4       0.9%
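
Because SibSp takes only seven distinct values, the value counts above are enough to recompute its sum, mean, variance and standard deviation. The sketch below does so in plain Python; it assumes the sample variance (dividing by n - 1), a choice made here because it reproduces the reported figures, and the function name is illustrative.

```python
import math


def stats_from_counts(counts: dict) -> dict:
    """Sum, mean, sample variance and standard deviation from {value: count}."""
    n = sum(counts.values())
    total = sum(value * count for value, count in counts.items())
    mean = total / n
    squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
    variance = squared_dev / (n - 1)  # sample variance (assumed, ddof = 1)
    return {"n": n, "sum": total, "mean": mean,
            "variance": variance, "std": math.sqrt(variance)}


# SibSp value counts from the tables above.
sibsp_a = stats_from_counts({0: 308, 1: 101, 2: 13, 3: 11, 4: 8, 5: 2, 8: 3})
sibsp_b = stats_from_counts({0: 308, 1: 100, 2: 12, 3: 7, 4: 12, 5: 3, 8: 4})
print(sibsp_a)
print(sibsp_b)
```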

Parch
Real number (ℝ)

                Dataset A    Dataset B
Distinct        6            6
Distinct (%)    1.3%         1.3%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            0.36098655   0.35201794
Minimum         0            0
Maximum         5            5
Zeros           342          344
Zeros (%)       76.7%        77.1%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           0
95-th percentile            2           2
Maximum                     5           5
Range                       5           5
Interquartile range (IQR)   0           0

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                0.75977697      0.74612585
Coefficient of variation (CV)     2.1047238       2.1195677
Kurtosis                          8.4018259       7.465106
Mean                              0.36098655      0.35201794
Median Absolute Deviation (MAD)   0               0
Skewness                          2.593079        2.5211801
Sum                               161             157
Variance                          0.57726105      0.55670378
Monotonicity                      Not monotonic   Not monotonic
[Figure: Histogram with fixed size bins (bins=6)]

Common Values

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
0                   342     76.7%            0                   344     77.1%
1                   59      13.2%            1                   59      13.2%
2                   39      8.7%             2                   36      8.1%
3                   2       0.4%             3                   3       0.7%
5                   2       0.4%             4                   3       0.7%
4                   2       0.4%             5                   1       0.2%

Smallest values

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
0          342     76.7%             344     77.1%
1          59      13.2%             59      13.2%
2          39      8.7%              36      8.1%
3          2       0.4%              3       0.7%
4          2       0.4%              3       0.7%
5          2       0.4%              1       0.2%

Ticket
Text

               Dataset A   Dataset B
Distinct       386         387
Distinct (%)   86.5%       86.8%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.8654709   6.7511211
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      3062        3011
Distinct characters   35          35
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       342         347
Unique (%)   76.7%       77.8%

Sample

          Dataset A   Dataset B
1st row   244310      112059
2nd row   17421       3101295
3rd row   330909      113781
4th row   347464      C.A. 34651
5th row   11967       A/4 48871

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
pc                  36      6.3%             pc                  25      4.4%
c.a                 13      2.3%             c.a                 15      2.6%
a/5                 9       1.6%             a/5                 8       1.4%
ston/o              7       1.2%             ca                  7       1.2%
2                   7       1.2%             ston/o              7       1.2%
sc/paris            6       1.0%             2                   7       1.2%
soton/oq            5       0.9%             1601                5       0.9%
ca                  5       0.9%             3101295             5       0.9%
4133                4       0.7%             a/4                 4       0.7%
14879               4       0.7%             347082              4       0.7%
Other values (406)  476     83.2%            Other values (410)  480     84.7%

Most occurring characters

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
3                   387     12.6%            3                   384     12.8%
1                   362     11.8%            1                   329     10.9%
2                   292     9.5%             2                   314     10.4%
7                   249     8.1%             7                   246     8.2%
4                   222     7.3%             4                   228     7.6%
6                   210     6.9%             0                   202     6.7%
5                   202     6.6%             6                   201     6.7%
0                   195     6.4%             5                   191     6.3%
9                   174     5.7%             9                   165     5.5%
8                   130     4.2%             8                   145     4.8%
Other values (25)   639     20.9%            Other values (25)   606     20.1%

Most occurring categories

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  3062    100.0%            3011    100.0%

Most frequent character per category

(unknown)

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
3                   387     12.6%            3                   384     12.8%
1                   362     11.8%            1                   329     10.9%
2                   292     9.5%             2                   314     10.4%
7                   249     8.1%             7                   246     8.2%
4                   222     7.3%             4                   228     7.6%
6                   210     6.9%             0                   202     6.7%
5                   202     6.6%             6                   201     6.7%
0                   195     6.4%             5                   191     6.3%
9                   174     5.7%             9                   165     5.5%
8                   130     4.2%             8                   145     4.8%
Other values (25)   639     20.9%            Other values (25)   606     20.1%

Most occurring scripts

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  3062    100.0%            3011    100.0%

Most frequent character per script

(unknown)

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
3                   387     12.6%            3                   384     12.8%
1                   362     11.8%            1                   329     10.9%
2                   292     9.5%             2                   314     10.4%
7                   249     8.1%             7                   246     8.2%
4                   222     7.3%             4                   228     7.6%
6                   210     6.9%             0                   202     6.7%
5                   202     6.6%             6                   201     6.7%
0                   195     6.4%             5                   191     6.3%
9                   174     5.7%             9                   165     5.5%
8                   130     4.2%             8                   145     4.8%
Other values (25)   639     20.9%            Other values (25)   606     20.1%

Most occurring blocks

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  3062    100.0%            3011    100.0%

Most frequent character per block

(unknown)

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
3                   387     12.6%            3                   384     12.8%
1                   362     11.8%            1                   329     10.9%
2                   292     9.5%             2                   314     10.4%
7                   249     8.1%             7                   246     8.2%
4                   222     7.3%             4                   228     7.6%
6                   210     6.9%             0                   202     6.7%
5                   202     6.6%             6                   201     6.7%
0                   195     6.4%             5                   191     6.3%
9                   174     5.7%             9                   165     5.5%
8                   130     4.2%             8                   145     4.8%
Other values (25)   639     20.9%            Other values (25)   606     20.1%

Fare
Real number (ℝ)

                Dataset A   Dataset B
Distinct        181         189
Distinct (%)    40.6%       42.4%
Missing         0           0
Missing (%)     0.0%        0.0%
Infinite        0           0
Infinite (%)    0.0%        0.0%
Mean            31.619628   31.456362
Minimum         0           0
Maximum         263         512.3292
Zeros           7           12
Zeros (%)       1.6%        2.7%
Negative        0           0
Negative (%)    0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.2292      7.0542
Q1                          7.925       7.8958
median                      14.45625    13.5
Q3                          31.3875     31.275
95-th percentile            108.28125   110.8833
Maximum                     263         512.3292
Range                       263         512.3292
Interquartile range (IQR)   23.4625     23.3792

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                41.12656        48.13367
Coefficient of variation (CV)     1.3006655       1.5301728
Kurtosis                          11.648125       29.223275
Mean                              31.619628       31.456362
Median Absolute Deviation (MAD)   6.72915         6.2708
Skewness                          3.0403662       4.4782418
Sum                               14102.354       14029.537
Variance                          1691.3939       2316.8502
Monotonicity                      Not monotonic   Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
8.05                25      5.6%             8.05                23      5.2%
13                  24      5.4%             7.75                21      4.7%
7.8958              21      4.7%             7.8958              19      4.3%
7.75                16      3.6%             13                  17      3.8%
10.5                14      3.1%             26                  14      3.1%
26                  10      2.2%             0                   12      2.7%
7.925               10      2.2%             10.5                12      2.7%
7.775               8       1.8%             7.925               11      2.5%
7.25                8       1.8%             7.775               9       2.0%
0                   7       1.6%             7.2292              8       1.8%
Other values (171)  303     67.9%            Other values (179)  300     67.3%

Smallest values

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
0                   7       1.6%             0                   12      2.7%
6.975               1       0.2%             6.2375              1       0.2%
7.0458              1       0.2%             6.4375              1       0.2%
7.05                4       0.9%             6.4958              1       0.2%
7.0542              1       0.2%             6.75                2       0.4%
7.125               1       0.2%             6.95                1       0.2%
7.1417              1       0.2%             7.0458              1       0.2%
7.225               6       1.3%             7.05                3       0.7%
7.2292              5       1.1%             7.0542              2       0.4%
7.25                8       1.8%             7.125               1       0.2%

Cabin
Text

               Dataset A   Dataset B
Distinct       91          87
Distinct (%)   83.5%       85.3%
Missing        337         344
Missing (%)    75.6%       77.1%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      15          15
Median length   3           3
Mean length     3.5229358   3.745098
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      384         382
Distinct characters   19          19
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       75          75
Unique (%)   68.8%       73.5%
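
With roughly three quarters of Cabin values missing, the percentages and the mean length in these tables are consistent with ratios taken over the non-missing rows (109 in Dataset A, 102 in Dataset B). That reading is inferred from the numbers rather than stated in the report. A brief sketch with an illustrative helper:

```python
def text_ratios(n_rows: int, n_missing: int, n_distinct: int,
                n_unique: int, total_chars: int) -> dict:
    """Ratios for a text column, using non-missing rows as the denominator (assumed)."""
    n_valid = n_rows - n_missing
    return {
        "n_valid": n_valid,
        "distinct_pct": round(100.0 * n_distinct / n_valid, 1),
        "unique_pct": round(100.0 * n_unique / n_valid, 1),
        "mean_length": total_chars / n_valid,
    }


# Cabin figures from the tables above.
cabin_a = text_ratios(446, 337, n_distinct=91, n_unique=75, total_chars=384)
cabin_b = text_ratios(446, 344, n_distinct=87, n_unique=75, total_chars=382)
print(cabin_a, cabin_b)
```

Both datasets have 75 unique values, yet the unique percentage differs because Dataset B has fewer non-missing rows.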

Sample

          Dataset A   Dataset B
1st row   B49         B94
2nd row   E58         C22 C26
3rd row   B96 B98     B22
4th row   E101        E46
5th row   E67         C68

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
b96                 3       2.4%             f2                  3       2.4%
b98                 3       2.4%             g6                  3       2.4%
c23                 3       2.4%             c23                 3       2.4%
c25                 3       2.4%             c25                 3       2.4%
c27                 3       2.4%             c27                 3       2.4%
b49                 2       1.6%             f                   3       2.4%
e25                 2       1.6%             d36                 2       1.6%
e67                 2       1.6%             e101                2       1.6%
c92                 2       1.6%             b22                 2       1.6%
g6                  2       1.6%             c22                 2       1.6%
Other values (91)   100     80.0%            Other values (89)   98      79.0%

Most occurring characters

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
C                   38      9.9%             2                   44      11.5%
2                   34      8.9%             C                   37      9.7%
B                   34      8.9%             6                   33      8.6%
3                   33      8.6%             B                   32      8.4%
1                   28      7.3%             3                   29      7.6%
8                   26      6.8%             1                   26      6.8%
6                   23      6.0%             (space)             22      5.8%
4                   22      5.7%             E                   20      5.2%
E                   21      5.5%             4                   20      5.2%
5                   21      5.5%             7                   19      5.0%
Other values (9)    104     27.1%            Other values (9)    100     26.2%

Most occurring categories

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  384     100.0%            382     100.0%

Most frequent character per category

(unknown)

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
C                   38      9.9%             2                   44      11.5%
2                   34      8.9%             C                   37      9.7%
B                   34      8.9%             6                   33      8.6%
3                   33      8.6%             B                   32      8.4%
1                   28      7.3%             3                   29      7.6%
8                   26      6.8%             1                   26      6.8%
6                   23      6.0%             (space)             22      5.8%
4                   22      5.7%             E                   20      5.2%
E                   21      5.5%             4                   20      5.2%
5                   21      5.5%             7                   19      5.0%
Other values (9)    104     27.1%            Other values (9)    100     26.2%

Most occurring scripts

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  384     100.0%            382     100.0%

Most frequent character per script

(unknown)

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
C                   38      9.9%             2                   44      11.5%
2                   34      8.9%             C                   37      9.7%
B                   34      8.9%             6                   33      8.6%
3                   33      8.6%             B                   32      8.4%
1                   28      7.3%             3                   29      7.6%
8                   26      6.8%             1                   26      6.8%
6                   23      6.0%             (space)             22      5.8%
4                   22      5.7%             E                   20      5.2%
E                   21      5.5%             4                   20      5.2%
5                   21      5.5%             7                   19      5.0%
Other values (9)    104     27.1%            Other values (9)    100     26.2%

Most occurring blocks

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  384     100.0%            382     100.0%

Most frequent character per block

(unknown)

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
C                   38      9.9%             2                   44      11.5%
2                   34      8.9%             C                   37      9.7%
B                   34      8.9%             6                   33      8.6%
3                   33      8.6%             B                   32      8.4%
1                   28      7.3%             3                   29      7.6%
8                   26      6.8%             1                   26      6.8%
6                   23      6.0%             (space)             22      5.8%
4                   22      5.7%             E                   20      5.2%
E                   21      5.5%             4                   20      5.2%
5                   21      5.5%             7                   19      5.0%
Other values (9)    104     27.1%            Other values (9)    100     26.2%

Embarked
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        1           1
Missing (%)    0.2%        0.2%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
S       317         323
C       86          83
Q       42          39

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      445         445
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   S           S
2nd row   C           S
3rd row   Q           S
4th row   S           S
5th row   C           S

Common Values

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
S          317     71.1%             323     72.4%
C          86      19.3%             83      18.6%
Q          42      9.4%              39      8.7%
(Missing)  1       0.2%              1       0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: Dataset A, Dataset B]

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
s          317     71.2%             323     72.6%
c          86      19.3%             83      18.7%
q          42      9.4%              39      8.8%

Most occurring characters

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
S          317     71.2%             323     72.6%
C          86      19.3%             83      18.7%
Q          42      9.4%              39      8.8%

Most occurring categories

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  445     100.0%            445     100.0%

Most frequent character per category

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
S          317     71.2%             323     72.6%
C          86      19.3%             83      18.7%
Q          42      9.4%              39      8.8%

Most occurring scripts

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  445     100.0%            445     100.0%

Most frequent character per script

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
S          317     71.2%             323     72.6%
C          86      19.3%             83      18.7%
Q          42      9.4%              39      8.8%

Most occurring blocks

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
(unknown)  445     100.0%            445     100.0%

Most frequent character per block

(unknown)

           Dataset A                 Dataset B
Value      Count   Frequency (%)     Count   Frequency (%)
S          317     71.2%             323     72.6%
C          86      19.3%             83      18.7%
Q          42      9.4%              39      8.8%

Interactions

[Figures: pairwise interaction plots, one Dataset A panel and one Dataset B panel per variable pair; images not recoverable]

Dataset A

2024-10-29T15:28:40.037118image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset B

2024-10-29T15:28:43.254880image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset A

2024-10-29T15:28:40.619683image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset B

2024-10-29T15:28:43.859340image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset A

2024-10-29T15:28:41.097725image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset B

2024-10-29T15:28:44.329510image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Correlations

Dataset A

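|  | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age | 1.000 | 0.126 | 0.166 | -0.257 | 0.054 | 0.292 | 0.190 | -0.175 | 0.160 |
| Embarked | 0.126 | 1.000 | 0.261 | 0.024 | 0.129 | 0.309 | 0.137 | 0.089 | 0.212 |
| Fare | 0.166 | 0.261 | 1.000 | 0.401 | -0.017 | 0.565 | 0.196 | 0.442 | 0.293 |
| Parch | -0.257 | 0.024 | 0.401 | 1.000 | -0.043 | 0.000 | 0.269 | 0.404 | 0.170 |
| PassengerId | 0.054 | 0.129 | -0.017 | -0.043 | 1.000 | 0.041 | 0.148 | -0.093 | 0.089 |
| Pclass | 0.292 | 0.309 | 0.565 | 0.000 | 0.041 | 1.000 | 0.130 | 0.145 | 0.355 |
| Sex | 0.190 | 0.137 | 0.196 | 0.269 | 0.148 | 0.130 | 1.000 | 0.178 | 0.587 |
| SibSp | -0.175 | 0.089 | 0.442 | 0.404 | -0.093 | 0.145 | 0.178 | 1.000 | 0.163 |
| Survived | 0.160 | 0.212 | 0.293 | 0.170 | 0.089 | 0.355 | 0.587 | 0.163 | 1.000 |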

Dataset B

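|  | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age | 1.000 | 0.058 | 0.132 | -0.228 | 0.024 | 0.285 | 0.055 | -0.239 | 0.186 |
| Embarked | 0.058 | 1.000 | 0.205 | 0.000 | 0.000 | 0.257 | 0.149 | 0.076 | 0.200 |
| Fare | 0.132 | 0.205 | 1.000 | 0.451 | -0.039 | 0.466 | 0.279 | 0.459 | 0.309 |
| Parch | -0.228 | 0.000 | 0.451 | 1.000 | 0.004 | 0.000 | 0.283 | 0.541 | 0.168 |
| PassengerId | 0.024 | 0.000 | -0.039 | 0.004 | 1.000 | 0.054 | 0.050 | -0.046 | 0.125 |
| Pclass | 0.285 | 0.257 | 0.466 | 0.000 | 0.054 | 1.000 | 0.141 | 0.143 | 0.325 |
| Sex | 0.055 | 0.149 | 0.279 | 0.283 | 0.050 | 0.141 | 1.000 | 0.227 | 0.542 |
| SibSp | -0.239 | 0.076 | 0.459 | 0.541 | -0.046 | 0.143 | 0.227 | 1.000 | 0.177 |
| Survived | 0.186 | 0.200 | 0.309 | 0.168 | 0.125 | 0.325 | 0.542 | 0.177 | 1.000 |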
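
As a rough illustration of how a correlation matrix like this can be scanned for strongly related variable pairs, the sketch below walks the upper triangle of the Dataset B matrix and lists every pair whose absolute coefficient reaches a cutoff. The cutoff of 0.5 and the helper name `strong_pairs` are assumptions made for this example; they are not taken from the report's configuration.

```python
from itertools import combinations

COLUMNS = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
           "Pclass", "Sex", "SibSp", "Survived"]

# Dataset B correlation coefficients, copied from this section.
MATRIX_B = [
    [1.000, 0.058, 0.132, -0.228, 0.024, 0.285, 0.055, -0.239, 0.186],
    [0.058, 1.000, 0.205, 0.000, 0.000, 0.257, 0.149, 0.076, 0.200],
    [0.132, 0.205, 1.000, 0.451, -0.039, 0.466, 0.279, 0.459, 0.309],
    [-0.228, 0.000, 0.451, 1.000, 0.004, 0.000, 0.283, 0.541, 0.168],
    [0.024, 0.000, -0.039, 0.004, 1.000, 0.054, 0.050, -0.046, 0.125],
    [0.285, 0.257, 0.466, 0.000, 0.054, 1.000, 0.141, 0.143, 0.325],
    [0.055, 0.149, 0.279, 0.283, 0.050, 0.141, 1.000, 0.227, 0.542],
    [-0.239, 0.076, 0.459, 0.541, -0.046, 0.143, 0.227, 1.000, 0.177],
    [0.186, 0.200, 0.309, 0.168, 0.125, 0.325, 0.542, 0.177, 1.000],
]


def strong_pairs(columns, matrix, threshold=0.5):
    """Return (col_a, col_b, value) for each off-diagonal pair with |value| >= threshold.

    The default threshold of 0.5 is an assumed cutoff for illustration only.
    """
    pairs = []
    # combinations() visits each unordered pair once, skipping the diagonal.
    for i, j in combinations(range(len(columns)), 2):
        value = matrix[i][j]
        if abs(value) >= threshold:
            pairs.append((columns[i], columns[j], value))
    return pairs


for col_a, col_b, value in strong_pairs(COLUMNS, MATRIX_B):
    print(f"{col_a} ~ {col_b}: {value:.3f}")
```

With the assumed cutoff, the only pairs returned for Dataset B are Parch with SibSp and Sex with Survived, which matches the high-correlation alerts listed for Dataset B in the overview.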

Missing values

Dataset A, Dataset B: A simple visualization of nullity by column.

Dataset A, Dataset B: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset A, Dataset B: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
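
One way to read "nullity correlation" is as the Pearson correlation between the present/missing indicators of two columns. The sketch below computes it in plain Python under that assumption; the function name `nullity_correlation` and the toy columns are hypothetical and are not taken from the report or from any library.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    A value is treated as missing when it is None. Returns NaN when either
    column is entirely present or entirely missing, because the indicator
    then has zero variance and the correlation is undefined.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy columns for illustration; not rows from Dataset A or B.
age = [22.0, None, 35.0, None, 54.0, None]
cabin = ["C85", None, "E46", None, None, None]

print(round(nullity_correlation(age, cabin), 3))
```

A value near 1 means the two columns tend to be missing on the same rows, a value near -1 means one tends to be present where the other is missing, and a value near 0 means the missingness patterns are unrelated.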

Sample

Dataset A

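|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 149 | 150 | 0 | 2 | Byles, Rev. Thomas Roussel Davids | male | 42.0 | 0 | 0 | 244310 | 13.0000 | NaN | S |
| 306 | 307 | 1 | 1 | Fleming, Miss. Margaret | female | NaN | 0 | 0 | 17421 | 110.8833 | NaN | C |
| 502 | 503 | 0 | 3 | O'Sullivan, Miss. Bridget Mary | female | NaN | 0 | 0 | 330909 | 7.6292 | NaN | Q |
| 281 | 282 | 0 | 3 | Olsson, Mr. Nils Johan Goransson | male | 28.0 | 0 | 0 | 347464 | 7.8542 | NaN | S |
| 484 | 485 | 1 | 1 | Bishop, Mr. Dickinson H | male | 25.0 | 1 | 0 | 11967 | 91.0792 | B49 | C |
| 229 | 230 | 0 | 3 | Lefebre, Miss. Mathilde | female | NaN | 3 | 1 | 4133 | 25.4667 | NaN | S |
| 662 | 663 | 0 | 1 | Colley, Mr. Edward Pomeroy | male | 47.0 | 0 | 0 | 5727 | 25.5875 | E58 | S |
| 435 | 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN | 0 | 0 | 36866 | 7.7375 | NaN | Q |
| 768 | 769 | 0 | 3 | Moran, Mr. Daniel J | male | NaN | 1 | 0 | 371110 | 24.1500 | NaN | Q |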

Dataset B

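|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 263 | 264 | 0 | 1 | Harrison, Mr. William | male | 40.0 | 0 | 0 | 112059 | 0.0000 | B94 | S |
| 686 | 687 | 0 | 3 | Panula, Mr. Jaako Arnold | male | 14.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 498 | 499 | 0 | 1 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S |
| 58 | 59 | 1 | 2 | West, Miss. Constance Mirium | female | 5.0 | 1 | 2 | C.A. 34651 | 27.7500 | NaN | S |
| 811 | 812 | 0 | 3 | Lester, Mr. James | male | 39.0 | 0 | 0 | A/4 48871 | 24.1500 | NaN | S |
| 745 | 746 | 0 | 1 | Crosby, Capt. Edward Gifford | male | 70.0 | 1 | 1 | WE/P 5735 | 71.0000 | B22 | S |
| 491 | 492 | 0 | 3 | Windelov, Mr. Einar | male | 21.0 | 0 | 0 | SOTON/OQ 3101317 | 7.2500 | NaN | S |
| 130 | 131 | 0 | 3 | Drazenoic, Mr. Jozef | male | 33.0 | 0 | 0 | 349241 | 7.8958 | NaN | C |
| 837 | 838 | 0 | 3 | Sirota, Mr. Maurice | male | NaN | 0 | 0 | 392092 | 8.0500 | NaN | S |
| 467 | 468 | 0 | 1 | Smart, Mr. John Montgomery | male | 56.0 | 0 | 0 | 113792 | 26.5500 | NaN | S |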

Dataset A

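|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 718 | 719 | 0 | 3 | McEvoy, Mr. Michael | male | NaN | 0 | 0 | 36568 | 15.5000 | NaN | Q |
| 94 | 95 | 0 | 3 | Coxon, Mr. Daniel | male | 59.0 | 0 | 0 | 364500 | 7.2500 | NaN | S |
| 385 | 386 | 0 | 2 | Davies, Mr. Charles Henry | male | 18.0 | 0 | 0 | S.O.C. 14879 | 73.5000 | NaN | S |
| 220 | 221 | 1 | 3 | Sunderland, Mr. Victor Francis | male | 16.0 | 0 | 0 | SOTON/OQ 392089 | 8.0500 | NaN | S |
| 293 | 294 | 0 | 3 | Haas, Miss. Aloisia | female | 24.0 | 0 | 0 | 349236 | 8.8500 | NaN | S |
| 461 | 462 | 0 | 3 | Morley, Mr. William | male | 34.0 | 0 | 0 | 364506 | 8.0500 | NaN | S |
| 760 | 761 | 0 | 3 | Garfirth, Mr. John | male | NaN | 0 | 0 | 358585 | 14.5000 | NaN | S |
| 452 | 453 | 0 | 1 | Foreman, Mr. Benjamin Laventall | male | 30.0 | 0 | 0 | 113051 | 27.7500 | C111 | C |
| 233 | 234 | 1 | 3 | Asplund, Miss. Lillian Gertrud | female | 5.0 | 4 | 2 | 347077 | 31.3875 | NaN | S |
| 788 | 789 | 1 | 3 | Dean, Master. Bertram Vere | male | 1.0 | 1 | 2 | C.A. 2315 | 20.5750 | NaN | S |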

Dataset B

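|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 450 | 451 | 0 | 2 | West, Mr. Edwy Arthur | male | 36.0 | 1 | 2 | C.A. 34651 | 27.7500 | NaN | S |
| 200 | 201 | 0 | 3 | Vande Walle, Mr. Nestor Cyriel | male | 28.0 | 0 | 0 | 345770 | 9.5000 | NaN | S |
| 309 | 310 | 1 | 1 | Francatelli, Miss. Laura Mabel | female | 30.0 | 0 | 0 | PC 17485 | 56.9292 | E36 | C |
| 372 | 373 | 0 | 3 | Beavan, Mr. William Thomas | male | 19.0 | 0 | 0 | 323951 | 8.0500 | NaN | S |
| 322 | 323 | 1 | 2 | Slayter, Miss. Hilda Mary | female | 30.0 | 0 | 0 | 234818 | 12.3500 | NaN | Q |
| 502 | 503 | 0 | 3 | O'Sullivan, Miss. Bridget Mary | female | NaN | 0 | 0 | 330909 | 7.6292 | NaN | Q |
| 443 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.0000 | NaN | S |
| 444 | 445 | 1 | 3 | Johannesen-Bratthammer, Mr. Bernt | male | NaN | 0 | 0 | 65306 | 8.1125 | NaN | S |
| 838 | 839 | 1 | 3 | Chip, Mr. Chang | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 813 | 814 | 0 | 3 | Andersson, Miss. Ebba Iris Alfrida | female | 6.0 | 4 | 2 | 347082 | 31.2750 | NaN | S |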

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.